Communication-Efficient Algorithms for Statistical Optimization
Authors
Abstract
We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error (MSE) that decays as O(N^{-1} + (N/m)^{-2}). Whenever m ≤ √N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O(N^{-1} + (N/m)^{-3}), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O(N^{-1} + (N/m)^{-3/2}), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 10^8 samples and d ≈ 740,000 covariates.
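As a rough illustration of the first algorithm described in the abstract, the sketch below splits N samples evenly across m simulated machines, minimizes a squared-error loss on each subset, and averages the m local estimates. The scalar linear model, the function names `local_least_squares` and `average_mixture`, and the simulated data are assumptions made for this example only; they are not taken from the paper, and the sketch does not cover the bootstrap or stochastic gradient variants.

```python
import random


def local_least_squares(xs, ys):
    """Minimize sum((y - theta * x)^2) over a scalar theta on one subset.

    Hypothetical local solver: the closed-form least-squares solution
    for a one-parameter linear model without intercept.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def average_mixture(xs, ys, m):
    """Average the m local minimizers, one per equally sized subset.

    Each simulated "machine" sees n = N // m samples; any remainder
    is dropped so that every subset has the same size.
    """
    n = len(xs) // m
    estimates = [
        local_least_squares(xs[i * n:(i + 1) * n], ys[i * n:(i + 1) * n])
        for i in range(m)
    ]
    return sum(estimates) / m


# Simulated data (an assumption of this sketch): y = theta_true * x + noise.
rng = random.Random(0)
theta_true = 2.0
N, m = 10_000, 10  # m <= sqrt(N), the regime highlighted in the abstract
xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
ys = [theta_true * x + rng.gauss(0.0, 1.0) for x in xs]

theta_avg = average_mixture(xs, ys, m)       # one round of communication
theta_central = local_least_squares(xs, ys)  # centralized baseline on all N

print(f"averaged estimate:    {theta_avg:.4f}")
print(f"centralized estimate: {theta_central:.4f}")
```

In this toy setting each machine sends a single number to the coordinator, so the communication cost is independent of the subset size. Comparing `theta_avg` with `theta_central` gives an informal feel for the claim that averaging can match the centralized rate when m is small relative to √N.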
Journal: CoRR
Volume: abs/1209.4129
Publication date: 2012